Significant progress has been witnessed in learning-based multi-view stereo (MVS) under both supervised and unsupervised settings. To combine their merits in accuracy and completeness, while reducing the demand for expensive labeled data, this paper explores a novel semi-supervised setting of learning-based MVS, where only a small portion of the MVS data is attached with dense depth ground truth. However, due to the huge variation of scenes and the flexible settings of views, the semi-supervised MVS problem (semi-MVS) may break the basic assumption of classic semi-supervised learning, namely that unlabeled data and labeled data share the same label space and data distribution. To handle these issues, we propose a novel semi-supervised MVS framework, namely SE-MVS. For the simple case where the basic assumption holds in the MVS data, consistency regularization encourages the model predictions to be consistent between an original sample and its randomly augmented counterpart via a KL-divergence constraint. For the more troublesome case where the basic assumption is violated in the MVS data, we propose a novel style-consistency loss to alleviate the negative effect caused by the distribution gap. The visual style of an unlabeled sample is transferred to a labeled sample to shrink the gap, and the model prediction on the generated sample is further supervised with the label of the original labeled sample. Experimental results on the DTU, BlendedMVS, GTA-SFM, and Tanks & Temples datasets show the superior performance of the proposed method. With the same backbone settings, our proposed SE-MVS outperforms its fully supervised and unsupervised baselines.
translated by Google Translate
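The KL-divergence consistency term can be illustrated with a minimal NumPy sketch. The function name `kl_consistency_loss` and the per-pixel depth-hypothesis logits are illustrative assumptions; the actual SE-MVS loss operates on probability volumes inside the network.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over depth-hypothesis logits
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_consistency_loss(logits_orig, logits_aug, eps=1e-12):
    """KL(P_orig || P_aug), averaged over pixels: penalizes the model
    when its prediction on the augmented sample drifts away from its
    prediction on the original sample."""
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))
```

In practice the gradient is only propagated through the augmented branch, so the original prediction acts as a fixed target.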
Typical trajectory planning for autonomous driving usually relies on predicting the future behavior of surrounding obstacles. In recent years, deep-learning-based prediction models have been widely adopted due to their impressive performance. However, recent studies have shown that deep learning models trained on a dataset following a long-tailed driving-scenario distribution suffer from large prediction errors in the "tail", which may cause failures of the downstream planner. To this end, this work defines a notion of prediction-model uncertainty to quantify the high errors caused by data sparsity. Moreover, this work proposes a trajectory planner that accounts for such prediction uncertainty for safer performance. First, the uncertainty of the prediction model caused by insufficient training data is estimated with an ensemble network structure. Then, the trajectory planner is designed to consider the worst case arising from prediction uncertainty. The results show that the proposed method can improve the safety of trajectory planning under the prediction uncertainty caused by insufficient data. Meanwhile, with sufficient data, the framework does not lead to overly conservative results. This technique helps improve the safety and reliability of autonomous vehicles under the long-tailed data distribution of the real world.
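The ensemble-based uncertainty estimate can be sketched in a few lines. This is a simplified illustration: `inflate_obstacle_radius` is a hypothetical heuristic standing in for the paper's worst-case-aware planner, and the (M, T, 2) trajectory layout is an assumption.

```python
import numpy as np

def ensemble_prediction(trajectories):
    """trajectories: (M, T, 2) array of M ensemble members' predicted
    (x, y) positions over T future steps.  Returns the mean trajectory
    and a per-step uncertainty (max std. dev. across members): sparse
    training data tends to make members disagree, raising this value."""
    mean = trajectories.mean(axis=0)                     # (T, 2)
    uncertainty = trajectories.std(axis=0).max(axis=-1)  # (T,)
    return mean, uncertainty

def inflate_obstacle_radius(base_radius, uncertainty, k=2.0):
    """Worst-case heuristic: grow the obstacle's safety radius with
    the prediction uncertainty, so the planner keeps a larger margin
    exactly where the predictor is least trustworthy."""
    return base_radius + k * uncertainty
```

With abundant training data the members agree, the uncertainty shrinks to zero, and the planner recovers its nominal, non-conservative behavior.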
Multi-object tracking (MOT) is one of the most fundamental computer-vision tasks, supporting various video-analysis applications. Despite recent promising progress, current MOT research is still limited to a fixed sampling frame rate of the input stream. In fact, we empirically find that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. Towards a more intelligent tracking solution, we shift the attention of our research to the problem of frame-rate-agnostic MOT (FraMOT). In this paper, we propose a Frame-Rate-Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we propose a Frame-Rate-Agnostic Association Module (FAAM) that infers and encodes frame-rate information to aid identity matching across inputs of multiple frame rates, improving the capability of the learned model in handling the complex motion-appearance relations in FraMOT. Moreover, the association gap between training and inference is enlarged in FraMOT, because the post-processing steps not included in training make a larger difference in lower-frame-rate scenarios. To address it, we propose a Periodic Training Scheme (PTS) to reflect all post-processing steps in training via tracking-pattern matching and fusion. Beyond the proposed approaches, we make the first attempt to establish an evaluation method for this new task in two different modes, i.e., known frame rate and unknown frame rate, aiming to handle more complex situations. Quantitative experiments on challenging MOT datasets (FraMOT version) clearly demonstrate that the proposed approaches can handle different frame rates better and thus improve robustness against complex scenarios.
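A toy version of frame-rate-aware association: scale the motion model by the frame interval so the same track state matches detections correctly at any frame rate. This is an illustrative sketch, not the learned FAAM module; the function name and the Gaussian-style score are assumptions.

```python
import numpy as np

def motion_affinity(track_pos, det_pos, track_vel, frame_rate):
    """Predict the track's position one frame ahead using its velocity
    scaled by the frame interval (1 / frame_rate), then score each
    candidate detection by proximity to that prediction."""
    dt = 1.0 / frame_rate
    predicted = track_pos + track_vel * dt
    dist = np.linalg.norm(det_pos - predicted, axis=-1)
    return np.exp(-dist)  # in (0, 1]; higher = better match
```

A fixed-frame-rate tracker implicitly bakes `dt` into its motion model, which is one reason accuracy collapses when the input frame rate changes.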
Diffusion-based (score-based) generative models have been widely used to model various types of complex data, including images, audio, and point clouds. Recently, a deep connection between forward-backward stochastic differential equations (SDEs) and diffusion-based models has been revealed, and several new SDE variants (e.g., sub-VP, critically-damped Langevin) have been proposed. Despite the empirical success of hand-crafted fixed forward SDEs, a large class of appropriate forward SDEs remains unexplored. In this work, we propose a general framework for parameterizing diffusion models, especially the spatial part of the forward SDE. An abstract formalism is introduced with theoretical guarantees, and its connection to previous diffusion models is leveraged. We demonstrate the theoretical advantages of our method from an optimization perspective. Numerical experiments on synthetic datasets, MNIST, and CIFAR10 are also presented to validate the effectiveness of our framework.
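In SDE notation, the contrast the abstract describes is between a hand-crafted forward process and a parameterized one. Using the standard score-SDE conventions (notation assumed, not taken from the paper), the fixed variance-preserving (VP) forward SDE and its generalization with a learnable spatial drift read:

```latex
\underbrace{\mathrm{d}\mathbf{x}_t \;=\; -\tfrac{1}{2}\,\beta(t)\,\mathbf{x}_t\,\mathrm{d}t \;+\; \sqrt{\beta(t)}\,\mathrm{d}\mathbf{w}_t}_{\text{fixed VP forward SDE}}
\qquad\Longrightarrow\qquad
\mathrm{d}\mathbf{x}_t \;=\; \mathbf{f}_\theta(\mathbf{x}_t, t)\,\mathrm{d}t \;+\; g(t)\,\mathrm{d}\mathbf{w}_t
```

Here $\mathbf{w}_t$ is a standard Wiener process, $\beta(t)$ is the noise schedule, and $\mathbf{f}_\theta$ is the parameterized spatial part of the drift that the framework learns instead of fixing by hand.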
Event cameras are bio-inspired vision sensors that asynchronously represent pixel-level brightness changes as event streams. Event-based monocular multi-view stereo (EMVS) is a technique that exploits the event streams to estimate semi-dense 3D structure with a known trajectory. It is a critical task for event-based monocular SLAM. However, the required intensive computation workloads make it challenging for real-time deployment on embedded platforms. In this paper, Eventor is proposed as a fast and efficient EMVS accelerator by realizing the most critical and time-consuming stages, including event back-projection and volumetric ray counting, on an FPGA. Highly parallelized and fully pipelined processing elements are specially designed on the FPGA and integrated with an embedded ARM core as a heterogeneous system to improve throughput and reduce the memory footprint. Meanwhile, the EMVS algorithm is reformulated in a more hardware-friendly manner by rescheduling, approximate computing, and hybrid data quantization. Evaluation results on the DAVIS dataset show that Eventor achieves up to a 24x improvement in energy efficiency compared with an Intel i5 CPU platform.
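The hybrid fixed-point quantization mentioned above can be sketched as a generic signed Qm.n quantizer with saturation. The bit widths below are illustrative; the abstract does not specify Eventor's actual formats.

```python
def quantize_fixed_point(x, int_bits, frac_bits):
    """Quantize a float to signed fixed-point with `int_bits` integer
    bits (including sign) and `frac_bits` fractional bits, saturating
    at the representable range -- the kind of hardware-friendly
    approximation used on FPGAs instead of floating point."""
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits - 1))   # most negative code
    hi = (1 << (int_bits + frac_bits - 1)) - 1  # most positive code
    q = max(lo, min(hi, round(x * scale)))
    return q / scale  # dequantized value, for inspecting the error
```

"Hybrid" quantization then amounts to picking different (int_bits, frac_bits) pairs per pipeline stage, trading precision for datapath width.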
Group equivariance (e.g., SE(3) equivariance) is a critical physical symmetry in science, from classical and quantum physics to computational biology. It enables robust and accurate predictions under arbitrary reference transformations. In light of this, great efforts have been put into encoding this symmetry into deep neural networks, which has been shown to improve generalization performance and data efficiency on downstream tasks. Constructing an equivariant neural network, however, generally brings a high computational cost to ensure expressiveness. Therefore, how to better trade off expressiveness against computational efficiency plays a central role in the design of equivariant deep learning models. In this paper, we propose a framework to construct SE(3)-equivariant graph neural networks that can approximate geometric quantities efficiently. Inspired by differential geometry and physics, we introduce equivariant local complete frames to graph neural networks, such that tensor information of a given order can be projected onto the frames. The local frames are constructed to form an orthonormal basis, which avoids direction degeneration and ensures completeness. Since the frames are built only by cross-product operations, our method is computationally efficient. We evaluate our method on two tasks: Newtonian mechanics modeling and equilibrium molecular conformation generation. Extensive experimental results demonstrate that our model achieves the best or competitive performance on both types of datasets.
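A cross-product frame construction can be written down directly. This is a sketch of the idea under my own choice of input vectors (two node positions); the paper's exact frame definition may differ.

```python
import numpy as np

def local_frame(x_i, x_j):
    """Build an orthonormal frame for the edge (i, j) using only
    differences and cross products:
      a = unit(x_i - x_j), b = unit(x_i x x_j), c = a x b.
    a is orthogonal to b because x_i x x_j is orthogonal to both
    endpoints, hence to their difference.  Degenerate when x_i and
    x_j are parallel (b would vanish) -- not handled in this sketch."""
    a = x_i - x_j
    a = a / np.linalg.norm(a)
    b = np.cross(x_i, x_j)
    b = b / np.linalg.norm(b)
    c = np.cross(a, b)
    return np.stack([a, b, c])  # rows form an orthonormal basis
```

Because cross products of rotated vectors are the rotated cross products, rotating both inputs by R rotates every frame axis by R, which is exactly the equivariance being exploited: projecting features onto the frame yields rotation-invariant scalars.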
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
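The cross-view cosine objective can be sketched for a single node. This is a simplified reading of the loss with hypothetical function names; the real CCGC objective aggregates over all nodes and all high-confidence cluster centers.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cluster_contrastive_loss(z1, z2, neg_centers):
    """Pull a node's two view embeddings (z1, z2) together, and push
    z1 away from the centers of the other high-confidence clusters,
    via cross-view cosine similarity.  Minimizing this maximizes the
    positive similarity and minimizes the negative similarities."""
    pos = cosine(z1, z2)
    neg = np.mean([cosine(z1, c) for c in neg_centers]) if len(neg_centers) else 0.0
    return neg - pos
```

Using cluster centers as negatives sidesteps the usual risk of treating same-cluster nodes as negatives, which is the "unreliable negatives" problem the abstract points out.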
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from randomized masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
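The RISE-style importance map can be sketched as a return-weighted keep frequency. The function name `importance_map` is hypothetical, and the actual R2RISE loop, which retrains the IL model once per random mask, is represented here only by its recorded returns.

```python
import numpy as np

def importance_map(masks, returns):
    """masks: (K, num_frames) binary array -- which demonstration
    frames were kept in each of K retraining rounds.  returns[k] is
    the environment return of the policy retrained on the k-th masked
    demonstration set.  A frame's importance is the average return of
    the rounds in which it was kept: frames whose presence coincides
    with high returns score high."""
    masks = np.asarray(masks, dtype=float)
    returns = np.asarray(returns, dtype=float)
    weighted = returns @ masks          # return-weighted keep counts
    keep_counts = masks.sum(axis=0)
    return weighted / np.maximum(keep_counts, 1.0)
```

This mirrors the original RISE attribution for image classifiers, with pixels replaced by demonstration frames and the class score replaced by the environment return.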
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results while extracting topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays close attention to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors, and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
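One way a text embedding can drive per-class mask prediction is sketched below. The shapes, the function name, and the learned projection `proj` are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def clip_driven_masks(voxel_feats, class_text_embs, proj):
    """Hypothetical CLIP-driven conditioning: project each class's
    text embedding into the image-feature space, then take the dot
    product with per-voxel features to get one mask logit per class.
    voxel_feats:      (N, D) features from the segmentation backbone
    class_text_embs:  (C, E) frozen CLIP text embeddings (one per class)
    proj:             (E, D) learned projection."""
    class_queries = class_text_embs @ proj   # (C, D)
    logits = voxel_feats @ class_queries.T   # (N, C) mask logits
    return logits
```

Because a new class only requires encoding its name with the frozen text encoder, such a design can be extended to new classes without retraining per-class output heads, matching the extensibility claim in the abstract.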